Abstract:Neural language models (LMs) are typically trained using only lexical features, such as surface forms of words. In this paper, we argue this deprives the LM of crucial syntactic signals that can be detected with high confidence using existing parsers. We present a simple but highly effective approach for training neural LMs using both lexical and syntactic information, and a novel approach for applying such LMs to unparsed text using sequential Monte Carlo sampling. In experiments on a range of corpora and corpus sizes, we show our approach consistently outperforms standard lexical LMs in character-level language modeling; word-level models, by contrast, perform on a par with standard LMs. These results indicate that, for character-level models, there is potential in expanding LMs beyond lexical surface features to higher-level NLP features.
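To illustrate how sequential Monte Carlo sampling can score text whose annotations are unobserved, the following is a minimal sketch and not the paper's model. It assumes a toy setting in which each word has a latent tag drawn from a first-order chain (an HMM stand-in for a syntax-aware LM), and it uses a plain bootstrap particle filter to estimate the log-likelihood of the words alone. The names `smc_log_likelihood`, `exact_log_likelihood`, `init`, `trans`, and `emit` are hypothetical; the exact forward recursion is included only as a reference value for the toy case.

```python
import numpy as np

def smc_log_likelihood(words, init, trans, emit, n_particles, rng):
    """Bootstrap particle filter estimate of log p(words), tags latent.

    init: (K,) initial tag probabilities
    trans: (K, K) tag transition probabilities (rows sum to 1)
    emit: (K, V) word probabilities given the tag (rows sum to 1)
    """
    n_tags = len(init)
    log_lik = 0.0
    # Each particle is one hypothesis about the current latent tag.
    tags = rng.choice(n_tags, size=n_particles, p=init)
    for t, w in enumerate(words):
        if t > 0:
            # Propagate: sample the next tag of every particle by
            # inverting the cumulative transition distribution.
            cum = np.cumsum(trans[tags], axis=1)
            u = rng.random(n_particles)[:, None]
            tags = np.minimum((u > cum).sum(axis=1), n_tags - 1)
        # Weight: how well each hypothesised tag explains the word.
        weights = emit[tags, w]
        log_lik += np.log(weights.mean())
        # Resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles,
                         p=weights / weights.sum())
        tags = tags[idx]
    return log_lik

def exact_log_likelihood(words, init, trans, emit):
    """Forward algorithm, tractable only because the toy model is tiny."""
    alpha = init * emit[:, words[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for w in words[1:]:
        alpha = (alpha @ trans) * emit[:, w]
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_lik

# Toy parameters (assumed for illustration only).
init = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2], [0.3, 0.7]])
emit = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
words = [0, 0, 2, 1, 2, 0]

rng = np.random.default_rng(0)
approx = smc_log_likelihood(words, init, trans, emit, 20000, rng)
exact = exact_log_likelihood(words, init, trans, emit)
print(approx, exact)
```

With a neural LM the exact recursion is unavailable, which is the situation where a particle-based estimate of this kind becomes useful.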
Abstract:We propose a method called ideal regression for approximating an arbitrary system of polynomial equations by a system of a particular type. Using techniques from approximate computational algebraic geometry, we show how we can solve ideal regression directly without resorting to numerical optimization. Ideal regression is useful whenever the solution to a learning problem can be described by a system of polynomial equations. As an example, we demonstrate how to formulate Stationary Subspace Analysis (SSA), a source separation problem, in terms of ideal regression, which also yields a consistent estimator for SSA. We then compare this estimator in simulations with previous optimization-based approaches for SSA.
Abstract:Detecting changes in high-dimensional time series is difficult because it involves the comparison of probability densities that need to be estimated from finite samples. In this paper, we present the first feature extraction method tailored to change point detection, which is based on an extended version of Stationary Subspace Analysis. We reduce the dimensionality of the data to the most non-stationary directions, which are most informative for detecting state changes in the time series. In extensive simulations on synthetic data we show that the accuracy of three change point detection algorithms is significantly increased by a prior feature extraction step. These findings are confirmed in an application to industrial fault monitoring.
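The idea of projecting onto the most non-stationary directions before running a change point detector can be sketched as follows. This is a deliberately simplified stand-in and not the extended Stationary Subspace Analysis described in the abstract: it assumes non-stationarity shows up only as shifts in the epoch means, whitens the data, and keeps the directions along which those means vary most. The function name `nonstationary_directions` and the parameters `n_epochs` and `k` are hypothetical.

```python
import numpy as np

def nonstationary_directions(X, n_epochs, k):
    """Return a (D, k) projection onto directions whose epoch means vary most.

    X: (T, D) multivariate time series, rows ordered in time.
    Simplification: only changes in the mean are considered.
    """
    # Centre and whiten with the global covariance so that all
    # directions are measured on the same scale.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ W
    # Split into consecutive epochs and collect the mean of each one.
    epochs = np.array_split(Z, n_epochs)
    means = np.stack([e.mean(axis=0) for e in epochs])
    # Second-moment matrix of the epoch means; large eigenvalues mark
    # directions in which the mean drifts across epochs.
    S = means.T @ means / n_epochs
    _, V = np.linalg.eigh(S)          # eigenvalues in ascending order
    top = V[:, ::-1][:, :k]           # k leading eigenvectors
    return W @ top

# Synthetic example: only the first of five channels changes its mean.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
X[1000:, 0] += 3.0

P = nonstationary_directions(X, n_epochs=10, k=1)
features = (X - X.mean(axis=0)) @ P   # 1-D input for a change point detector
print(np.abs(P[:, 0]).argmax())
```

The reduced series `features` would then be passed to whichever change point detection algorithm is in use, in place of the full five-dimensional data.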